WASP: Scalable Bayes via barycenters of subset posteriors
نویسندگان
چکیده
The promise of Bayesian methods for big data sets has not fully been realized due to the lack of scalable computational algorithms. For massive data, it is necessary to store and process subsets on different machines in a distributed manner. We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. To combine these subset posteriors, we calculate the Wasserstein barycenter via a highly efficient linear program. The resulting estimate for the Wasserstein posterior (WASP) has an atomic form, facilitating straightforward estimation of posterior summaries of functionals of interest. The WASP approach allows posterior sampling algorithms for smaller data sets to be trivially scaled to huge data. We provide theoretical justification in terms of posterior consistency and algorithm efficiency. Examples are provided in complex settings including Gaussian process regression and nonparametric Bayes mixture models.
منابع مشابه
THE EMPIRICAL BAYES METHOD OF ANALYSIS OF A SERIES OF EXPERIMENTS
The classical method of analysis of a series of experiments is somewhat involved in being conditional on various, occasionally unrealistic, assumptions such as homogeneity of variances of experimental error, lack of interactions of treatments and places,etc. In this work, we adopt a Bayesian view to account for such heterogeneities. Our appoach is illustrated by a real series of experiment...
متن کاملEmpirical Bayes interval estimates that are conditionally equal to unadjusted confidence intervals or to default prior credibility intervals.
Problems involving thousands of null hypotheses have been addressed by estimating the local false discovery rate (LFDR). A previous LFDR approach to reporting point and interval estimates of an effect-size parameter uses an estimate of the prior distribution of the parameter conditional on the alternative hypothesis. That estimated prior is often unreliable, and yet strongly influences the post...
متن کاملGroup invariant inferred distributions via noncommutative probability
Abstract: One may consider three types of statistical inference: Bayesian, frequentist, and group invariance-based. The focus here is on the last method. We consider the Poisson and binomial distributions in detail to illustrate a group invariance method for constructing inferred distributions on parameter spaces given observed results. These inferred distributions are obtained without using Ba...
متن کاملPAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers
We propose a PAC-Bayes theorem for the sample-compression setting where each classifier is described by a compression subset of the training data and a message string of additional information. This setting, which is the appropriate one to describe many learning algorithms, strictly generalizes the usual data-independent setting where classifiers are represented only by data-independent message...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015